Basic Statistics

Raw Counts

Name                    Value
Rows                    7,375
Columns                 91
Discrete columns        44
Continuous columns      47
All missing columns     0
Missing observations    6,634
Complete Rows           2,660
Total observations      671,125
Memory allocation       3.9 Mb
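
The raw counts are internally consistent, and the usual percentage summary can be derived from them. The following is a minimal Python sketch of that arithmetic, using only the figures in the table; the variable names and the `pct` helper are illustrative and do not come from the tool that produced the report.

```python
# Figures copied from the Raw Counts table.
rows = 7_375
columns = 91
discrete_columns = 44
continuous_columns = 47
missing_observations = 6_634
complete_rows = 2_660

# Every cell of the table counts as one observation.
total_observations = rows * columns


def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


percentages = {
    "discrete_columns": pct(discrete_columns, columns),
    "continuous_columns": pct(continuous_columns, columns),
    "missing_observations": pct(missing_observations, total_observations),
    "complete_rows": pct(complete_rows, rows),
}

for name, value in percentages.items():
    print(f"{name}: {value}%")
```

Under these figures, roughly 1% of all cells are missing, yet only about 36% of rows are fully complete, which suggests the missing values are spread across many rows rather than concentrated in a few.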

Percentages

Data Structure

root (Classes 'data.table' and 'data.frame': 7375 obs. of 91 variables:)
  id (int)
  budget (int)
  original_language (Factor w/ 44 levels "af","ar","bm",)
  popularity (num)
  release_date (Date, format)
  Date.CONVERT (chr)
  runtime (num)
  runtime_cat (Factor w/ 3 levels "Large","Medium",)
  revenue (num)
  sw_lang_en (Factor w/ 2 levels "0","1")
  sw_web_presence (Factor w/ 2 levels "0","1")
  sw_has_poster (Factor w/ 2 levels "0","1")
  sw_tagline (Factor w/ 2 levels "0","1")
  keyword_cnt (int)
  release_year (num)
  release_month (num)
  high_release_month (Factor w/ 2 levels "0","1")
  release_day (num)
  seasonality (num)
  sw_collection (int)
  producers_cnt (num)
  countries_cnt (int)
  lang_US (Factor w/ 2 levels "0","1")
  lang_FR (Factor w/ 2 levels "0","1")
  lang_RU (Factor w/ 2 levels "0","1")
  lang_ES (Factor w/ 2 levels "0","1")
  lang_JA (Factor w/ 2 levels "0","1")
  keywords_cnt (int)
  actor0_movies_cnt (int)
  actor0_movies_5y_cnt (int)
  actor1_movies_cnt (int)
  actor1_movies_5y_cnt (int)
  actor2_movies_cnt (int)
  actor2_movies_5y_cnt (int)
  sw_female_actor0 (Factor w/ 3 levels "0","1","")
  sw_female_actor1 (Factor w/ 3 levels "0","1","")
  sw_female_actor2 (Factor w/ 3 levels "0","1","")
  sw_male_actor0 (Factor w/ 3 levels "0","1","")
  sw_male_actor1 (Factor w/ 3 levels "0","1","")
  sw_male_actor2 (Factor w/ 3 levels "0","1","")
  actor0_prev_revenue (num)
  actor1_prev_revenue (num)
  actor2_prev_revenue (num)
  director_movies_cnt (int)
  director_movies_5y_cnt (int)
  genre_adventure (Factor w/ 2 levels "0","1")
  genre_fantasy (Factor w/ 2 levels "0","1")
  genre_animation (Factor w/ 2 levels "0","1")
  genre_drama (Factor w/ 2 levels "0","1")
  genre_horror (Factor w/ 2 levels "0","1")
  genre_action (Factor w/ 2 levels "0","1")
  genre_comedy (Factor w/ 2 levels "0","1")
  genre_history (Factor w/ 2 levels "0","1")
  genre_western (Factor w/ 2 levels "0","1")
  genre_thriller (Factor w/ 2 levels "0","1")
  genre_crime (Factor w/ 2 levels "0","1")
  genre_documentary (Factor w/ 2 levels "0","1")
  genre_science_fiction (Factor w/ 2 levels "0","1")
  genre_mystery (Factor w/ 2 levels "0","1")
  genre_music (Factor w/ 2 levels "0","1")
  genre_romance (Factor w/ 2 levels "0","1")
  genre_family (Factor w/ 2 levels "0","1")
  genre_war (Factor w/ 2 levels "0","1")
  genre_foreign (Factor w/ 2 levels "0","1")
  depart_Art (num)
  depart_Camera (num)
  depart_Crew (num)
  depart_Custom_Mkup (num)
  depart_Directing (num)
  depart_Editing (num)
  depart_Lighting (num)
  depart_Production (num)
  depart_Sound (num)
  depart_Visual_Effects (num)
  depart_Writing (num)
  depart_Art_female (num)
  depart_Camera_female (num)
  depart_Crew_female (num)
  depart_Custom_Mkup_female (num)
  depart_Directing_female (num)
  depart_Editing_female (num)
  depart_Lighting_female (num)
  depart_Production_female (num)
  depart_Sound_female (num)
  depart_Visual_Effects_female (num)
  depart_Writing_female (num)
  friday (Factor w/ 2 levels "0","1")
  weekend (Factor w/ 2 levels "0","1")
  directors_cat (Factor w/ 3 levels "Star","medium",)
  productor_cat (Factor w/ 3 levels "Star","medium",)
  popular_film (Factor w/ 2 levels "popular","not popular")

Missing Data Profile

Univariate Distribution

Histogram

Bar Chart (with frequency)

## 2 columns ignored with more than 50 categories.
## release_date: 4697 categories
## Date.CONVERT: 4697 categories
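
The message above reflects a cardinality cutoff: discrete columns with more distinct values than the limit are left out of the bar charts. A minimal Python sketch of such a filter follows. The function name `split_by_cardinality`, its `max_categories` parameter, and the toy data are hypothetical and only illustrate the idea; this is not the reporting tool's own implementation.

```python
def split_by_cardinality(columns, max_categories=50):
    """Separate columns into those kept and those ignored for plotting.

    `columns` maps a column name to its list of values. A column is
    ignored when its number of distinct values exceeds `max_categories`.
    """
    kept, ignored = {}, {}
    for name, values in columns.items():
        n_categories = len(set(values))
        if n_categories > max_categories:
            ignored[name] = n_categories
        else:
            kept[name] = values
    return kept, ignored


# Toy data (invented): one low-cardinality and one high-cardinality column.
toy_columns = {
    "runtime_cat": ["Large", "Medium", "Large", "Medium"],
    "release_date": [f"day-{i}" for i in range(60)],
}

kept, ignored = split_by_cardinality(toy_columns, max_categories=50)

print(f"{len(ignored)} columns ignored with more than 50 categories.")
for name, n_categories in ignored.items():
    print(f"{name}: {n_categories} categories")
```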

QQ Plot

## Warning: Removed 598 rows containing non-finite values (stat_qq).
## Warning: Removed 598 rows containing non-finite values (stat_qq_line).

## Warning: Removed 49 rows containing non-finite values (stat_qq).
## Warning: Removed 49 rows containing non-finite values (stat_qq_line).

## Warning: Removed 95 rows containing non-finite values (stat_qq).
## Warning: Removed 95 rows containing non-finite values (stat_qq_line).

## Warning: Removed 36 rows containing non-finite values (stat_qq).
## Warning: Removed 36 rows containing non-finite values (stat_qq_line).

## Warning: Removed 36 rows containing non-finite values (stat_qq).
## Warning: Removed 36 rows containing non-finite values (stat_qq_line).

## Warning: Removed 8 rows containing non-finite values (stat_qq).
## Warning: Removed 8 rows containing non-finite values (stat_qq_line).
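
Each pair of warnings above reports how many rows of a plotted column were dropped because the value was not a finite number (missing, NaN, or infinite). A minimal Python sketch of that count is shown here; `count_non_finite` and the sample values are hypothetical, and `None` is used as a stand-in for a missing value.

```python
import math


def count_non_finite(values):
    """Count entries that are missing (None), NaN, or infinite."""
    return sum(1 for v in values if v is None or not math.isfinite(v))


# Invented sample: two usable values and three that a QQ plot would drop.
sample = [1.0, None, float("nan"), float("inf"), 2.5]
removed = count_non_finite(sample)

print(f"Removed {removed} rows containing non-finite values")
```

Running a check like this per column before plotting shows which variables account for the removed rows.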

Correlation Analysis

## 3 features with more than 20 categories ignored!
## original_language: 31 categories
## release_date: 2186 categories
## Date.CONVERT: 2186 categories
## Warning in cor(x = structure(list(id = c(1L, 2L, 3L, 7L, 9L, 10L, 11L, 12L, :
## the standard deviation is zero
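
The "standard deviation is zero" warning arises because Pearson correlation divides the covariance by the product of the two standard deviations; if a column is constant, that product is zero and the coefficient is undefined. The Python sketch below illustrates the case with a hypothetical `pearson` helper that returns `None` instead of dividing by zero. It is a plain restatement of the formula, not the code that produced the report.

```python
import statistics


def pearson(x, y):
    """Pearson correlation, or None when either input has zero spread."""
    sx = statistics.pstdev(x)
    sy = statistics.pstdev(y)
    if sx == 0 or sy == 0:
        return None  # undefined: the formula would divide by zero
    mx = statistics.fmean(x)
    my = statistics.fmean(y)
    # Population covariance, matching the population standard deviations.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sx * sy)


varying = pearson([1, 2, 3], [2, 4, 6])    # perfectly linear pair
constant = pearson([1, 1, 1], [1, 2, 3])   # first column never changes

print(varying, constant)
```

Any column that is constant within the rows used for the correlation matrix will trigger this warning, so checking for constant columns first makes the affected cells easy to identify.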

Principal Component Analysis

## 2 features with more than 50 categories ignored!
## release_date: 2186 categories
## Date.CONVERT: 2186 categories
## Warning in plot_prcomp(data = structure(list(id = c(1L, 2L, 3L, 7L, 9L, : The following features are dropped due to zero variance:
##  * actor2_movies_5y_cnt
##  * sw_has_poster_1
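
The two dropped features are constant in the rows used for the analysis. A constant column cannot be rescaled to unit variance, which is presumably why it is removed before the components are computed. The Python sketch below shows a simple zero-variance filter; `drop_zero_variance` is a hypothetical helper, and the values are invented (only the two feature names are taken from the warning above).

```python
def drop_zero_variance(columns):
    """Split columns into those that vary and those that are constant.

    `columns` maps a column name to its list of values. A column with
    at most one distinct value has zero variance.
    """
    kept, dropped = {}, []
    for name, values in columns.items():
        if len(set(values)) <= 1:
            dropped.append(name)
        else:
            kept[name] = values
    return kept, dropped


# Invented values; names borrowed from the report for illustration only.
toy_columns = {
    "budget": [10, 25, 40, 5],
    "actor2_movies_5y_cnt": [0, 0, 0, 0],
    "sw_has_poster_1": [1, 1, 1, 1],
}

kept, dropped = drop_zero_variance(toy_columns)

print("The following features are dropped due to zero variance:")
for name in dropped:
    print(f" * {name}")
```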